Classification Using Statistically Significant Rules

Authors

  • Florian Verhein
  • Sanjay Chawla
Abstract

Classification based on association rule mining has become a popular technique within the data mining community. However, it has now been emphatically shown that association rules generated solely on the basis of support and confidence are often not statistically significant, i.e., the rules generated are artifacts of the particular dataset being mined rather than a relationship inherent in the underlying population (process). This is not surprising, because the use of support is motivated by its computational rather than its statistical properties. In this paper we show that mining for statistically significant rules in a classification setting, by “forcing” Fisher’s Exact Test or its continuous approximation to be “anti-monotonic”, results in a) the vast majority of the mined rules being statistically significant by definition, and b) comparable classification performance on balanced datasets and higher performance on imbalanced datasets. All this is achieved while examining, on average, only 0.5% of the search space, using 0.4% of the time, and finding 0.06% as many rules as techniques based on the support-confidence framework. We also provide additional evidence against support and confidence, primarily that they are biased in imbalanced datasets. Thus one arrives at an inescapable conclusion: classification based on rule mining by support and confidence thresholds is not necessary, not efficient and perhaps misleading.
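The statistic at the centre of the abstract is the p-value of Fisher's Exact Test on the 2×2 contingency table of a class association rule X → cls, with the chi-square statistic as its continuous approximation. The sketch below computes both for a single rule and contrasts them with confidence on a hypothetical imbalanced data set. It is a minimal illustration under stated assumptions, not the authors' implementation: the one-sided test for positive association, the function names (`confidence`, `fisher_one_sided_p`, `chi2_one_sided_p`) and the record counts are all choices made here, and the step that makes the test "anti-monotonic" so the search space can be pruned is not shown.

```python
from math import comb, erfc, sqrt


def confidence(a: int, b: int) -> float:
    """Confidence of X -> cls: fraction of records covered by X that carry cls."""
    return a / (a + b)


def fisher_one_sided_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for positive association of X with cls.

    Contingency table (counts of training records):
                  cls   not cls
        X          a       b
        not X      c       d

    With all margins fixed, the top-left count follows a hypergeometric
    distribution under independence; the p-value is P(A >= a).
    """
    n = a + b + c + d
    covered = a + b            # records matched by the antecedent X
    in_class = a + c           # records labelled cls
    hi = min(covered, in_class)
    # Sum exact integer counts of tables at least as extreme, then normalise.
    extreme = sum(comb(in_class, k) * comb(n - in_class, covered - k)
                  for k in range(a, hi + 1))
    return extreme / comb(n, covered)


def chi2_one_sided_p(a: int, b: int, c: int, d: int) -> float:
    """Normal (chi-square, 1 d.o.f.) approximation to the same one-sided test.

    z = sqrt(n) * (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)); z**2 is Pearson's
    chi-square statistic without continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin admits only one table, so the exact test also gives 1.
        return 1.0
    z = sqrt(n) * (a * d - b * c) / sqrt(denom)
    return 0.5 * erfc(z / sqrt(2))  # upper-tail probability P(Z >= z)


if __name__ == "__main__":
    # Hypothetical imbalanced data set: 1000 records, 50 in the minority class.
    # Tables are (a, b, c, d) as laid out in fisher_one_sided_p.
    rules = {
        "X1 -> minority": (10, 10, 40, 940),
        "X2 -> majority": (96, 4, 854, 46),
    }
    for name, (a, b, c, d) in rules.items():
        print(f"{name}: conf={confidence(a, b):.2f} "
              f"fisher_p={fisher_one_sided_p(a, b, c, d):.2e} "
              f"chi2_p={chi2_one_sided_p(a, b, c, d):.2e}")
```

In this made-up example, X2 clears a typical 0.8 confidence threshold even though its confidence of 0.96 barely exceeds the 0.95 majority-class prior and neither test finds it significant, while X1, with a confidence of only 0.50, would be discarded by that threshold despite a very small p-value. This is one concrete way the bias of confidence on imbalanced data shows up.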


Similar Resources

Exploiting statistically significant dependent rules for associative classification

Established associative classification algorithms have been shown to be very effective in handling categorical data such as text data. The learned model is a set of rules that are easy to understand and can be edited. However, they still suffer from the following limitations: first, they mostly use the support-confidence framework to mine classification association rules which require the setting of...


USING DISTRIBUTION OF DATA TO ENHANCE PERFORMANCE OF FUZZY CLASSIFICATION SYSTEMS

This paper considers the automatic design of fuzzy rule-based classification systems based on labeled data. The classification performance and interpretability are of major importance in these systems. In this paper, we utilize the distribution of training patterns in the decision subspace of each fuzzy rule to improve its initially assigned certainty grade (i.e. rule weight). Our approach uses a punish...


A hybridization of evolutionary fuzzy systems and Ant Colony Optimization for intrusion detection

A hybrid approach for intrusion detection in computer networks is presented in this paper. The proposed approach combines an evolutionary-based fuzzy system with an Ant Colony Optimization procedure to generate high-quality fuzzy-classification rules. We applied our hybrid learning approach to network security and validated it using the DARPA KDD-Cup99 benchmark data set. The results indicate t...


GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION

This paper considers the generation of some interpretable fuzzy rules for assigning an amino acid sequence to the appropriate protein superfamily. Since the main objective of this classifier is the interpretability of rules, we have used the distribution of amino acids in the sequences of proteins as features. These features are the occurrence probabilities of six exchange groups in the seque...


USING SIGNIFICANT, POSITIVELY ASSOCIATED AND RELATIVELY CLASS CORRELATED RULES FOR ASSOCIATIVE CLASSIFICATION OF IMBALANCED DATASETS (School of IT Technical Report)

The application of association rule mining to classification has led to a new family of classifiers which are often referred to as “Associative Classifiers (ACs)”. The advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Another advantage that ACs enjoy is that they are based on a global search criterion, unlike other rule-based classifiers – e.g. d...




Publication date: 2007